Getting Started with Geospatial Analysis with Python, GeoJSON and GeoPandas

As a native New Yorker, I’d be a mess without Google Maps every single time I go anywhere else and things are suddenly not split by streets and avenues. We take these products for granted, but in reality, they’re an important point of convenience. Products like Google or Apple Maps are built on foundations of geospatial technology. At the center of these technologies are locations, their interactions and roles in a greater ecosystem of location services.

This field is referred to as Geospatial Analysis. Geospatial analysis applies statistical analysis to data that has geographical or geometrical components. In this tutorial, we’ll use Python to learn the basics of acquiring geospatial data, handling it, and visualizing it. More specifically, we’ll do some interactive visualizations of the United States!

Environment Setup

This guide was written in Python 3.6. If you haven't already, download Python and Pip. Next, you’ll need to install several packages that we’ll use throughout this tutorial. You can do this by opening terminal or command prompt on your operating system:

pip3 install shapely==1.5.17.post1
pip3 install geopandas==0.2.1
pip3 install geojsonio==0.0.3

Lastly, you can download all the data for this workshops here. And now we're ready to start!

Introduction

Data typically comes in the form of a few fundamental data types: strings, floats, integers, and booleans. Geospatial data, however, uses a different set of data types for its analyses. Using the shapely module, we’ll review what these different data types look like.

shapely has a class called geometry that contains different geometric objects. Using this module we'll import the needed data types:


In [ ]:
from shapely.geometry import Point, Polygon

The simplest data type in geospatial analysis is the Point data type. Points are objects representing a single location in a two-dimensional space, or simply put, XY coordinates. In Python, we use the point class with x and y as parameters to create a point object:


In [ ]:
p1 = Point(0,0)
print(p1)

Notice that when we print p1, the output is POINT (0 0). This indicated that the object returned isn't a built-in data type we'll see in Python. We can check this by asking Python to interpret whether or not the point is equivalent to the tuple (0, 0):


In [ ]:
print(p1 == (0,0))

This returns False, and the reason for that is its type. If we print the type of p1, we will get a shapely Point object:


In [ ]:
print(type(p1))

Next we have a Polygon, which is a two-dimensional surface that’s stored as a sequence of points that define the exterior. Because a polygon is composed of multiple points, the shapely polygon object takes a list of tuples as a parameter.


In [ ]:
polygon = Polygon([(0,0),(1,1),(1,0)])

Oddly enough, the shapely Polygon object will not take a list of shapely points as a parameter. If we incorrectly input a Point, we'll get an error message.

polygon = Polygon([Point(0,0), Point(1,1), Point(1,0)]) # Don't do this!

Data Structures

GeoJSON is a format for representing geographic objects. It's different from regular json because it supports geometry types, such as: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection.

Using geojson, making visualizations becomes suddenly easier, as you'll see in a later section. This is primarily because geojson allows us to store collections of geometric data types in one central structure.

GeoPandas is a Python module used to make working with geospatial data in python easier by extending the datatypes used by pandas to allow spatial operations on geometric types.

Typically, geopandas is abbreviated with gpd and is used to read geojson data intro a DataFrame. Below you can see that we've printed out five rows of a geojson DataFrame:


In [ ]:
import geopandas as gpd
states = gpd.read_file('states.geojson')
print(states.head())

Just as with regular json and pandas dataframes, geojson and geopandas have functions which allow you to easily convert one to the other. Using the example dataset from above, we can convert the DataFrame to a geojson object using the to_json() function:


In [ ]:
states = states.to_json()
print(states)

Being able to easily convert geojson from one format to another gives us more freedom as to what we can do with our data, whether that be analyzing it, visualizating, manipulating it.

Next we will review geojsonio, a tool used for visualizing geojson on the browser. Using the states dataset above, we'll visualize the United States as a series of Polygons.


In [ ]:
import geojsonio
geojsonio.display(states)

Once this code is run, a link is opened in the browser, displaying an interface as you see below:

On the left of the page, you can see that the geoJSON displayed and available for editing. If you zoom in and select a geometric object, you'll see that you also have the option to customize it:

And perhaps most importantly, geojsonio has multiple options for sharing your content. First, there's the option to share a link directly:

And to everyone's convenience the option to save to github, github gist, geojson, csvs, and various other formats gives developers plenty of flexibility when deciding how to share or host content.

In the example before we used geopandas to pass geoJSON to the display() function. If no manipulation on the geospatial needs to be performed, we can treat the file as any other and set its contents to a variable:


In [ ]:
contents = open('map.geojson').read()
print(contents)

Since JSON is technically a string, the format is still a suitable parameter for the display() function. Again, the main difference between using geopandas is whether or not any manipulation needs to be done!

This example is simply a point, so besides reading in the json, nothing necessarily has to be done, so we'll just pass in the string directly:


In [ ]:
import geojsonio
geojsonio.display(contents)

And once again, a link is opened in the browser and we have this beautiful visualization of a location in Manhattan.

And That’s a Wrap

That wraps up an introduction to GeoSpatial Analysis with Python. Most of these techniques are interchangeable in R, but Python is one of the best suitable languages for geospatial analysis. Its modules and tools are built with developers in mind, making the transition into geospatial analysis must easier.

In this tutorial, we visualized a map of the United States, as well as plotted a coordinate data point in Manhattan. There are multiple ways in which you can expand on these exercises -- state outlines are crucial to so many visualizations created to compare results between states.

Moving forward from this tutorial, not only can you create this sort of visualization, but you can combine the techniques we used to plot coordinates throughout multiple states. To learn more about geospatial analysis, check the resources below:

  • GeoJSON
  • OpenStreetMap
  • CartoDB

If you liked what you did here, follow me (@lesleyclovesyou) on Twitter for more content, data science ramblings, and most importantly, retweets of super cute puppies.


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:

Geospatial Data Visualization

GeoPandas is a python module used to make working with geospatial data in python easier by extending the datatypes used by pandas to allow spatial operations on geometric types. So now that we know what polygons are, we can set up a map of the United States using data of the coordinates that shape each state.

Shapely converts feature geometry into GeoJSON structure and contains tools for geometry manipulations. This module works with three of the types of geometric objects we discussed before: points, curves, and surfaces. Descartes works off of shapely for visualizing geometric objects!

First, we import the needed modules.


In [ ]:
from shapely.geometry import shape, LineString, Point
from descartes import PolygonPatch
import fiona
import matplotlib.pyplot as plt

These are some coordinates we'll need to plot the path of a flight from San Francisco to New York.


In [ ]:
latlons = [(37.766, -122.43), (39.239, -114.89), (38.820, -104.82), (38.039, -97.96),
    (38.940, -92.32), (39.156, -86.53), (40.749, -84.08), (41.494, -81.66),
    (42.325, -80.06), (41.767, -78.01), (41.395, -75.68), (40.625, -73.780)]

print(latlons)

This simply takes the points and reformats the x and y (since it's originally coordinates) and converts it to a LineString type.


In [ ]:
ls = LineString(latlons)

So now we want to display this on a map. Using some data I found online (which you can access via the github), I turn each of these into polygon with fiona.


In [ ]:
with fiona.collection("shapefiles/statesp020.shp") as features:
        states = [shape(f['geometry']) for f in features]

fig = plt.figure(figsize=(24, 12), dpi=180)

In order to use plotly, you have to own an account with your own API key. To find these, go into your plotly folder and type the following command into your terminal:

vim ~/.plotly/.credentials

It should be in the following format - the username and api_key fieds will be especially important.

{
    "username": "username",
    "stream_ids": [],
    "api_key": "api_key",
    "proxy_username": "",
    "proxy_password": ""    
}

And so here we begin. First, as always, we import the needed modules. From there, we initialize our session by signing in on plotly.


In [ ]:
import plotly.plotly as py
from plotly.graph_objs import *

py.sign_in('lc2958', 'xxxxx')

Plotly supports three types of maps - chloropeth, atlas maps, and satelite maps. Using data from the electoral college, we'll plot a map of the United States with a color scale. The darker the color, the greater number of votes.

So first, we load in the electoral college data.

Final Words

Most of these techniques are interchangeable in R, but Python is one of the best suitable languages for geospatial analysis. Its modules and tools are built with developers in mind, making the transition into geospatial analysis must easier.

Resources

GeoJSON
OpenStreetMap
CartoDB


In [ ]: